Dataset statistics
| Number of variables | 5 |
|---|---|
| Number of observations | 95010 |
| Missing cells | 0 |
| Missing cells (%) | 0.0% |
| Duplicate rows | 0 |
| Duplicate rows (%) | 0.0% |
| Total size in memory | 3.6 MiB |
| Average record size in memory | 40.0 B |
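As a rough consistency check, the total size should be close to the number of observations multiplied by the average record size. The sketch below assumes that relationship holds and that MiB means 1024 × 1024 bytes; the constant and function names are illustrative only.

```python
# Values copied from the "Dataset statistics" table.
N_OBSERVATIONS = 95_010
AVG_RECORD_BYTES = 40.0  # reported average record size in bytes

BYTES_PER_MIB = 1024 ** 2  # assumption: binary mebibytes


def total_size_mib(n_rows: int, avg_record_bytes: float) -> float:
    """Total in-memory size in MiB implied by row count and record size."""
    return n_rows * avg_record_bytes / BYTES_PER_MIB


size_mib = total_size_mib(N_OBSERVATIONS, AVG_RECORD_BYTES)
print(f"{size_mib:.1f} MiB")  # prints "3.6 MiB"
```

The result agrees with the reported 3.6 MiB to the one decimal place shown.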
Variable types
| Numeric | 3 |
|---|---|
| Text | 2 |
Reproduction
| Analysis started | 2024-05-03 15:40:34.407960 |
|---|---|
| Analysis finished | 2024-05-03 15:48:41.746915 |
| Duration | 8 minutes and 7.34 seconds |
| Software version | ydata-profiling v4.7.0 |
| Download configuration | config.json |
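The reported duration can be recovered from the two timestamps above. This is a minimal sketch using only the standard library; the output format string is chosen to mirror the table and is not taken from the profiling tool itself.

```python
from datetime import datetime

FMT = "%Y-%m-%d %H:%M:%S.%f"

# Timestamps copied from the "Reproduction" table.
started = datetime.strptime("2024-05-03 15:40:34.407960", FMT)
finished = datetime.strptime("2024-05-03 15:48:41.746915", FMT)

elapsed = finished - started
minutes, seconds = divmod(elapsed.total_seconds(), 60)
duration_text = f"{int(minutes)} minutes and {seconds:.2f} seconds"
print(duration_text)  # prints "8 minutes and 7.34 seconds"
```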
similarity_score
Real number (ℝ)
| Distinct | 90870 |
|---|---|
| Distinct (%) | 95.6% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 0.7785381 |
| Minimum | 0.739786 |
|---|---|
| Maximum | 0.87120444 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 742.4 KiB |
Quantile statistics
| Minimum | 0.739786 |
|---|---|
| 5-th percentile | 0.75973005 |
| Q1 | 0.76874564 |
| median | 0.7767409 |
| Q3 | 0.78595968 |
| 95-th percentile | 0.80438816 |
| Maximum | 0.87120444 |
| Range | 0.13141844 |
| Interquartile range (IQR) | 0.017214041 |
Descriptive statistics
| Standard deviation | 0.013764874 |
|---|---|
| Coefficient of variation (CV) | 0.017680411 |
| Kurtosis | 1.2805986 |
| Mean | 0.7785381 |
| Median Absolute Deviation (MAD) | 0.0085044335 |
| Skewness | 0.87431604 |
| Sum | 73968.905 |
| Variance | 0.00018947175 |
| Monotonicity | Not monotonic |
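Several of the figures above are derived from others: range = maximum − minimum, IQR = Q3 − Q1, CV = standard deviation / mean, and variance = standard deviation squared. The sketch below recomputes them from the reported (rounded) values for similarity_score and compares them within a small relative tolerance; the tolerance is an assumption chosen to absorb rounding in the table.

```python
import math

# Values copied from the similarity_score tables.
minimum, maximum = 0.739786, 0.87120444
q1, q3 = 0.76874564, 0.78595968
mean, std = 0.7785381, 0.013764874

derived = {
    "range": maximum - minimum,
    "iqr": q3 - q1,
    "cv": std / mean,
    "variance": std ** 2,
}

reported = {
    "range": 0.13141844,
    "iqr": 0.017214041,
    "cv": 0.017680411,
    "variance": 0.00018947175,
}

# Compare each derived statistic with the value printed in the report.
matches = {
    name: math.isclose(value, reported[name], rel_tol=1e-5)
    for name, value in derived.items()
}

for name, value in derived.items():
    print(f"{name:9s} derived={value:.8g} reported={reported[name]:.8g} match={matches[name]}")
```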
Histogram with fixed size bins (bins=50)
| Value | Count | Frequency (%) |
| 0.7564311041 | 9 | < 0.1% |
| 0.7608817269 | 9 | < 0.1% |
| 0.7711655799 | 9 | < 0.1% |
| 0.7679277082 | 9 | < 0.1% |
| 0.7671090325 | 9 | < 0.1% |
| 0.7670514205 | 9 | < 0.1% |
| 0.7658925468 | 9 | < 0.1% |
| 0.7658477805 | 9 | < 0.1% |
| 0.764914634 | 9 | < 0.1% |
| 0.7646556549 | 9 | < 0.1% |
| Other values (90860) | 94920 |
| Value | Count | Frequency (%) |
| 0.7397860001 | 1 | |
| 0.7401986584 | 1 | |
| 0.7405600195 | 1 | |
| 0.7408847744 | 1 | |
| 0.74121344 | 1 | |
| 0.7413294225 | 1 | |
| 0.7414419418 | 1 | |
| 0.7422464185 | 1 | |
| 0.7423244582 | 1 | |
| 0.7423705559 | 1 |
| Value | Count | Frequency (%) |
| 0.8712044448 | 2 | |
| 0.8695070585 | 1 | |
| 0.8628570823 | 1 | |
| 0.8613146561 | 1 | |
| 0.8602318313 | 2 | |
| 0.8599051395 | 1 | |
| 0.8593465219 | 1 | |
| 0.8578624039 | 2 | |
| 0.8566986522 | 2 | |
| 0.8558077554 | 1 |
skill_id
Real number (ℝ)
| Distinct | 3020 |
|---|---|
| Distinct (%) | 3.2% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 4061.8227 |
| Minimum | 1 |
|---|---|
| Maximum | 8741 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 742.4 KiB |
Quantile statistics
| Minimum | 1 |
|---|---|
| 5-th percentile | 382 |
| Q1 | 1930 |
| median | 3842 |
| Q3 | 6193 |
| 95-th percentile | 8347 |
| Maximum | 8741 |
| Range | 8740 |
| Interquartile range (IQR) | 4263 |
Descriptive statistics
| Standard deviation | 2514.3955 |
|---|---|
| Coefficient of variation (CV) | 0.61903133 |
| Kurtosis | -1.0958914 |
| Mean | 4061.8227 |
| Median Absolute Deviation (MAD) | 2006 |
| Skewness | 0.22047571 |
| Sum | 3.8591378 × 10^8 |
| Variance | 6322184.9 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
| Value | Count | Frequency (%) |
| 890 | 2080 | 2.2% |
| 521 | 982 | 1.0% |
| 7550 | 885 | 0.9% |
| 6232 | 797 | 0.8% |
| 2715 | 710 | 0.7% |
| 4290 | 705 | 0.7% |
| 4254 | 674 | 0.7% |
| 5484 | 625 | 0.7% |
| 6707 | 569 | 0.6% |
| 148 | 556 | 0.6% |
| Other values (3010) | 86427 |
| Value | Count | Frequency (%) |
| 1 | 2 | < 0.1% |
| 3 | 1 | < 0.1% |
| 7 | 5 | < 0.1% |
| 8 | 34 | |
| 9 | 8 | < 0.1% |
| 10 | 6 | < 0.1% |
| 20 | 4 | < 0.1% |
| 22 | 1 | < 0.1% |
| 25 | 22 | |
| 29 | 2 | < 0.1% |
| Value | Count | Frequency (%) |
| 8741 | 1 | < 0.1% |
| 8737 | 2 | < 0.1% |
| 8727 | 13 | < 0.1% |
| 8722 | 50 | |
| 8721 | 1 | < 0.1% |
| 8720 | 74 | |
| 8712 | 3 | < 0.1% |
| 8703 | 41 | |
| 8697 | 96 | |
| 8694 | 21 | < 0.1% |
skill_name
Text
| Distinct | 3020 |
|---|---|
| Distinct (%) | 3.2% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 742.4 KiB |
Length
| Max length | 100 |
|---|---|
| Median length | 70 |
| Mean length | 33.714662 |
| Min length | 3 |
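The mean length appears to equal the variable's total character count divided by the number of observations. A minimal check, assuming that definition and using the figures reported in this profile for skill_name (3203230 characters, 95010 rows):

```python
# Figures copied from the report.
N_OBSERVATIONS = 95_010
TOTAL_CHARACTERS = 3_203_230  # skill_name, "Characters and Unicode" table

mean_length = TOTAL_CHARACTERS / N_OBSERVATIONS
print(f"{mean_length:.6f}")  # prints "33.714662"
```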
Characters and Unicode
| Total characters | 3203230 |
|---|---|
| Distinct characters | 77 |
| Distinct categories | 1 |
| Distinct scripts | 1 |
| Distinct blocks | 1 |
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
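To illustrate what such character properties look like, the sketch below uses Python's standard `unicodedata` module to count general categories in the first sample value of this variable. It is an illustration only and not necessarily how the profiling tool derives its figures; this report lists the category as "(unknown)".

```python
import unicodedata
from collections import Counter

sample = "Oracle HRIS"  # first sample row of skill_name


def category_counts(text: str) -> Counter:
    """Count Unicode general categories (e.g. Lu, Ll, Zs) in text."""
    return Counter(unicodedata.category(ch) for ch in text)


counts = category_counts(sample)
for category, count in sorted(counts.items()):
    print(category, count)
```

Here `Lu` is an uppercase letter, `Ll` a lowercase letter, and `Zs` a space separator.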
Unique
| Unique | 658 |
|---|---|
| Unique (%) | 0.7% |
Sample
| 1st row | Oracle HRIS |
|---|---|
| 2nd row | Human resources management system HRMS |
| 3rd row | OrangeHRM |
| 4th row | Human resource management software HRMS |
| 5th row | Human resource information system HRIS |
| Value | Count | Frequency (%) |
| software | 15680 | 4.1% |
| management | 13005 | 3.4% |
| system | 10181 | 2.6% |
| manager | 9239 | 2.4% |
| systems | 8527 | 2.2% |
| and | 7412 | 1.9% |
| health | 4934 | 1.3% |
| information | 4204 | 1.1% |
| human | 4149 | 1.1% |
| resources | 3043 | 0.8% |
| Other values (3997) | 305517 |
Most occurring characters
| Value | Count | Frequency (%) |
| (space) | 385891 | 12.0% |
| e | 283452 | 8.8% |
| a | 240364 | 7.5% |
| n | 208685 | 6.5% |
| t | 208651 | 6.5% |
| o | 177847 | 5.6% |
| r | 173011 | 5.4% |
| i | 172283 | 5.4% |
| s | 158686 | 5.0% |
| l | 93306 | 2.9% |
| Other values (67) | 1101054 |
Most occurring categories
| Value | Count | Frequency (%) |
| (unknown) | 3203230 |
Most frequent character per category
(unknown)
| Value | Count | Frequency (%) |
| (space) | 385891 | 12.0% |
| e | 283452 | 8.8% |
| a | 240364 | 7.5% |
| n | 208685 | 6.5% |
| t | 208651 | 6.5% |
| o | 177847 | 5.6% |
| r | 173011 | 5.4% |
| i | 172283 | 5.4% |
| s | 158686 | 5.0% |
| l | 93306 | 2.9% |
| Other values (67) | 1101054 |
Most occurring scripts
| Value | Count | Frequency (%) |
| (unknown) | 3203230 |
Most frequent character per script
(unknown)
| Value | Count | Frequency (%) |
| (space) | 385891 | 12.0% |
| e | 283452 | 8.8% |
| a | 240364 | 7.5% |
| n | 208685 | 6.5% |
| t | 208651 | 6.5% |
| o | 177847 | 5.6% |
| r | 173011 | 5.4% |
| i | 172283 | 5.4% |
| s | 158686 | 5.0% |
| l | 93306 | 2.9% |
| Other values (67) | 1101054 |
Most occurring blocks
| Value | Count | Frequency (%) |
| (unknown) | 3203230 |
Most frequent character per block
(unknown)
| Value | Count | Frequency (%) |
| (space) | 385891 | 12.0% |
| e | 283452 | 8.8% |
| a | 240364 | 7.5% |
| n | 208685 | 6.5% |
| t | 208651 | 6.5% |
| o | 177847 | 5.6% |
| r | 173011 | 5.4% |
| i | 172283 | 5.4% |
| s | 158686 | 5.0% |
| l | 93306 | 2.9% |
| Other values (67) | 1101054 |
job_id
Real number (ℝ)
| Distinct | 3154 |
|---|---|
| Distinct (%) | 3.3% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 589158.95 |
| Minimum | 469953 |
|---|---|
| Maximum | 616704 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 742.4 KiB |
Quantile statistics
| Minimum | 469953 |
|---|---|
| 5-th percentile | 536654 |
| Q1 | 580813 |
| median | 596149 |
| Q3 | 606955 |
| 95-th percentile | 614951 |
| Maximum | 616704 |
| Range | 146751 |
| Interquartile range (IQR) | 26142 |
Descriptive statistics
| Standard deviation | 24703.789 |
|---|---|
| Coefficient of variation (CV) | 0.0419306 |
| Kurtosis | 2.210657 |
| Mean | 589158.95 |
| Median Absolute Deviation (MAD) | 11966 |
| Skewness | -1.4938334 |
| Sum | 5.5975992 × 10^10 |
| Variance | 6.1027717 × 10^8 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
| Value | Count | Frequency (%) |
| 592883 | 60 | 0.1% |
| 587399 | 60 | 0.1% |
| 587838 | 60 | 0.1% |
| 582993 | 60 | 0.1% |
| 543373 | 60 | 0.1% |
| 597966 | 60 | 0.1% |
| 616094 | 60 | 0.1% |
| 602801 | 60 | 0.1% |
| 607208 | 60 | 0.1% |
| 613292 | 60 | 0.1% |
| Other values (3144) | 94410 |
| Value | Count | Frequency (%) |
| 469953 | 30 | |
| 470441 | 30 | |
| 470567 | 30 | |
| 472791 | 30 | |
| 473825 | 30 | |
| 479039 | 30 | |
| 481622 | 30 | |
| 482229 | 30 | |
| 483286 | 30 | |
| 483469 | 30 |
| Value | Count | Frequency (%) |
| 616704 | 30 | |
| 616699 | 30 | |
| 616697 | 30 | |
| 616692 | 30 | |
| 616691 | 30 | |
| 616636 | 30 | |
| 616634 | 30 | |
| 616580 | 30 | |
| 616570 | 30 | |
| 616564 | 30 |
job_title
Text
| Distinct | 2179 |
|---|---|
| Distinct (%) | 2.3% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 742.4 KiB |
Length
| Max length | 147 |
|---|---|
| Median length | 92 |
| Mean length | 33.479949 |
| Min length | 5 |
Characters and Unicode
| Total characters | 3180930 |
|---|---|
| Distinct characters | 72 |
| Distinct categories | 1 |
| Distinct scripts | 1 |
| Distinct blocks | 1 |
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
Sample
| 1st row | Business Operations Analyst |
|---|---|
| 2nd row | Business Operations Analyst |
| 3rd row | Business Operations Analyst |
| 4th row | Business Operations Analyst |
| 5th row | Business Operations Analyst |
| Value | Count | Frequency (%) |
| of | 20310 | 5.2% |
| bureau | 13680 | 3.5% |
| director | 9840 | 2.5% |
| manager | 9600 | 2.4% |
| and | 9480 | 2.4% |
| health | 8100 | 2.1% |
| assistant | 7590 | 1.9% |
| analyst | 7320 | 1.9% |
| specialist | 7230 | 1.8% |
| | 6120 | 1.6% |
| Other values (1341) | 294390 |
Most occurring characters
| Value | Count | Frequency (%) |
| (space) | 395880 | 12.4% |
| e | 229590 | 7.2% |
| i | 185880 | 5.8% |
| t | 176730 | 5.6% |
| r | 175680 | 5.5% |
| a | 175050 | 5.5% |
| n | 163860 | 5.2% |
| o | 152490 | 4.8% |
| s | 108930 | 3.4% |
| A | 84990 | 2.7% |
| Other values (62) | 1331850 |
Most occurring categories
| Value | Count | Frequency (%) |
| (unknown) | 3180930 |
Most frequent character per category
(unknown)
| Value | Count | Frequency (%) |
| (space) | 395880 | 12.4% |
| e | 229590 | 7.2% |
| i | 185880 | 5.8% |
| t | 176730 | 5.6% |
| r | 175680 | 5.5% |
| a | 175050 | 5.5% |
| n | 163860 | 5.2% |
| o | 152490 | 4.8% |
| s | 108930 | 3.4% |
| A | 84990 | 2.7% |
| Other values (62) | 1331850 |
Most occurring scripts
| Value | Count | Frequency (%) |
| (unknown) | 3180930 |
Most frequent character per script
(unknown)
| Value | Count | Frequency (%) |
| (space) | 395880 | 12.4% |
| e | 229590 | 7.2% |
| i | 185880 | 5.8% |
| t | 176730 | 5.6% |
| r | 175680 | 5.5% |
| a | 175050 | 5.5% |
| n | 163860 | 5.2% |
| o | 152490 | 4.8% |
| s | 108930 | 3.4% |
| A | 84990 | 2.7% |
| Other values (62) | 1331850 |
Most occurring blocks
| Value | Count | Frequency (%) |
| (unknown) | 3180930 |
Most frequent character per block
(unknown)
| Value | Count | Frequency (%) |
| (space) | 395880 | 12.4% |
| e | 229590 | 7.2% |
| i | 185880 | 5.8% |
| t | 176730 | 5.6% |
| r | 175680 | 5.5% |
| a | 175050 | 5.5% |
| n | 163860 | 5.2% |
| o | 152490 | 4.8% |
| s | 108930 | 3.4% |
| A | 84990 | 2.7% |
| Other values (62) | 1331850 |
| | job_id | similarity_score | skill_id |
|---|---|---|---|
| job_id | 1.000 | -0.151 | -0.019 |
| similarity_score | -0.151 | 1.000 | 0.055 |
| skill_id | -0.019 | 0.055 | 1.000 |
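The matrix above reads as pairwise correlation coefficients between the numeric variables. The report does not state which coefficient was used, so the sketch below shows one plausible choice, Spearman rank correlation, implemented with the standard library and applied to the first ten rows of the dataset only. Its result is not expected to match the full-dataset figures.

```python
import math


def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=values.__getitem__)
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def spearman(xs, ys):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(xs), ranks(ys))


# First ten rows of the dataset (similarity_score truncated, skill_id).
scores = [0.83389, 0.82432, 0.82426, 0.82181, 0.81900,
          0.81786, 0.81740, 0.81655, 0.81490, 0.81377]
skill_ids = [5625, 3603, 5682, 3602, 3601, 5651, 5613, 1873, 4341, 5654]

rho = spearman(scores, skill_ids)
print(f"Spearman rho on 10 sample rows: {rho:.3f}")
```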
A simple visualization of nullity by column.
The nullity matrix is a data-dense display that lets you quickly pick out patterns in data completion.
| | similarity_score | skill_id | skill_name | job_id | job_title |
|---|---|---|---|---|---|
| 0 | 0.83389015353728035 | 5625 | Oracle HRIS | 606346 | Business Operations Analyst |
| 1 | 0.82431666402027548 | 3603 | Human resources management system HRMS | 606346 | Business Operations Analyst |
| 2 | 0.82425782201016473 | 5682 | OrangeHRM | 606346 | Business Operations Analyst |
| 3 | 0.82180843221890232 | 3602 | Human resource management software HRMS | 606346 | Business Operations Analyst |
| 4 | 0.81899765731401342 | 3601 | Human resource information system HRIS | 606346 | Business Operations Analyst |
| 5 | 0.81785952968766662 | 5651 | Oracle PeopleSoft Enterprise Human Resources | 606346 | Business Operations Analyst |
| 6 | 0.81739794236492302 | 5613 | Oracle E-Business Suite Human Resources Management System | 606346 | Business Operations Analyst |
| 7 | 0.81655120417300819 | 1873 | Consultants in Data Processing HRnet | 606346 | Business Operations Analyst |
| 8 | 0.81490089810122257 | 4341 | Lawson Human Resource Management | 606346 | Business Operations Analyst |
| 9 | 0.8137670379005908 | 5654 | Oracle PeopleSoft Human Capital Management | 606346 | Business Operations Analyst |
| | similarity_score | skill_id | skill_name | job_id | job_title |
|---|---|---|---|---|---|
| 95000 | 0.77717135700741291 | 7218 | Softrail AEI Rail & Road Manager | 561563 | Deputy Director - Long-Range Planning and Policy |
| 95001 | 0.77629480951944985 | 2282 | Digital Crew Teamwork Project Manager | 561563 | Deputy Director - Long-Range Planning and Policy |
| 95002 | 0.7747333213877996 | 5996 | PlanGraphics Citywide GIS Utility | 561563 | Deputy Director - Long-Range Planning and Policy |
| 95003 | 0.77442286429250728 | 7740 | Texas Transportation Institute TTI Progression Analysis and Signal System Evaluation Routine PASSER | 561563 | Deputy Director - Long-Range Planning and Policy |
| 95004 | 0.77440746725609322 | 5210 | Municipal geographic management software | 561563 | Deputy Director - Long-Range Planning and Policy |
| 95005 | 0.77424792870754899 | 3168 | GEOCOMtms A.Maze Planning | 561563 | Deputy Director - Long-Range Planning and Policy |
| 95006 | 0.77414005781220974 | 6674 | Route planning software | 561563 | Deputy Director - Long-Range Planning and Policy |
| 95007 | 0.77373875493571664 | 8265 | Vision Management Consulting IEP PlaNET | 561563 | Deputy Director - Long-Range Planning and Policy |
| 95008 | 0.77370806262038472 | 261 | Adaptive Planning | 561563 | Deputy Director - Long-Range Planning and Policy |
| 95009 | 0.77339079021347656 | 7910 | Total Officer Personnel Management Information System TOPMIS | 561563 | Deputy Director - Long-Range Planning and Policy |